Criteria of GenCall score to edit marker data and methods to handle missing markers have an influence on accuracy of genomic predictions
Abstract
The aim of this study was to investigate the effect of different strategies for handling low-quality or missing data on prediction accuracy for direct genomic values of protein yield, mastitis and fertility using a Bayesian variable model and a GBLUP model in the Danish Jersey population. The data contained 1 071 Jersey bulls that were genotyped with the Illumina Bovine 50K chip. After preliminary editing, 39 227 single nucleotide polymorphisms (SNPs) remained in the dataset. Four methods of handling missing genotypes were compared: 1) BEAGLE: missing markers were imputed using Beagle 3.3 software, 2) COMMON: missing genotypes at a locus were replaced by the most common genotype observed at this locus in the marker data, 3) EX-ALLELE: missing marker genotypes at a locus were treated as an extra allele, and 4) POP-EXP: missing genotypes at a locus were replaced with the population expectation at this locus. Among the methods used in this study, imputation with Beagle was the best approach to handling missing genotypes. Treating missing markers as a pseudo-allele, replacing missing markers with a population average, or substituting the most common genotype each reduced the accuracy of genomic predictions. These results suggest that missing genotypes should be imputed in order to improve genomic prediction. Editing the marker data with a stringent threshold on GenCall scores and then imputing the discarded genotypes did not lead to higher accuracy. All marker genotypes with a GenCall score over 0.15 should be retained for genomic prediction.
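The two simplest strategies named in the abstract, COMMON and POP-EXP, together with the GenCall filter, can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes genotypes are coded as allele counts 0/1/2 with NaN for a missing call, and it reads "population expectation" as the mean observed allele count at the locus. The function names and the toy data are hypothetical. BEAGLE imputation relies on external software, and the coding used for EX-ALLELE is not described in the abstract, so neither is shown.

```python
import numpy as np


def mask_low_gencall(genotypes, gencall, threshold=0.15):
    """Set calls whose GenCall score is not above `threshold` to NaN."""
    g = np.asarray(genotypes, dtype=float).copy()
    g[np.asarray(gencall, dtype=float) <= threshold] = np.nan
    return g


def impute_common(genotypes):
    """COMMON (assumed coding): fill with the most frequent observed genotype."""
    g = np.asarray(genotypes, dtype=float).copy()
    for j in range(g.shape[1]):
        col = g[:, j]  # view into g, so assignments update g
        missing = np.isnan(col)
        if missing.all():
            continue  # nothing observed at this locus, leave it missing
        values, counts = np.unique(col[~missing], return_counts=True)
        col[missing] = values[np.argmax(counts)]  # ties go to the smallest code
    return g


def impute_pop_exp(genotypes):
    """POP-EXP (assumed reading): fill with the mean observed allele count."""
    g = np.asarray(genotypes, dtype=float).copy()
    for j in range(g.shape[1]):
        col = g[:, j]
        missing = np.isnan(col)
        if missing.all():
            continue
        col[missing] = col[~missing].mean()
    return g


# Toy data: 4 animals x 3 loci, with hypothetical GenCall scores.
genotypes = np.array([[0, 1, 2],
                      [0, 2, 2],
                      [2, 1, 0],
                      [0, 1, 2]], dtype=float)
gencall = np.array([[0.90, 0.80, 0.10],
                    [0.70, 0.05, 0.90],
                    [0.60, 0.90, 0.90],
                    [0.10, 0.90, 0.90]])

masked = mask_low_gencall(genotypes, gencall)
common = impute_common(masked)
pop_exp = impute_pop_exp(masked)
```

COMMON keeps every value a valid genotype code, whereas POP-EXP produces fractional values, which is acceptable only when the downstream model treats genotypes as numeric covariates.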
Related articles
The Impact of Different Genetic Architectures on Accuracy of Genomic Selection Using Three Bayesian Methods
Genome-wide evaluation uses the associations of a large number of single nucleotide polymorphism (SNP) markers across the whole genome and then combines the statistical methods with genomic data to predict the genetic values. Genomic prediction relies on linkage disequilibrium (LD) between genetic markers and quantitative trait loci (QTL) in a population. Methods that use all markers simultaneo...
Importance of Genetic Relationship and Phenotypic Records on Genomic Accuracy of Simulated Imputed Data Using Animal Models in the Presence of Genotype × Environment Interactions
The objective of this study was to investigate the role of genetic relationships between the training and validation sets, considering different ratios of phenotypic records in the training set, on the accuracy of genomic prediction via animal models containing genotype × environment interactions in simulated imputed data. For this purpose, four different scenarios using 15k density containing differen...
Effect of marker density and trait heritability on the accuracy of genomic prediction over three generations
The aim of this study was to determine the effect of marker density, level of heritability, number of QTLs, and size of training set on the genomic accuracy over three generations. Thereby, a trait was simulated with heritability of 0.10, 0.25 or 0.40. For each animal, a genome with 20 chromosomes, 1 Morgan each, was simulated. Different marker densities (2000, 4000 and 6000 markers) and 400 an...
Effect of Markers Effect Estimation Methods, Population Structure and Trait Architecture on the Accuracy of Genomic Breeding Values
This study aimed to investigate the effect of the method used to estimate marker effects, QTL distribution, number of QTLs, effective population size and trait heritability on the accuracy of genomic predictions. Two effective population sizes, 100 and 500 individuals, were simulated by QMSim software. A 100 cM genome including one chromosome was simulated where 500 SNPs and two diffe...
Effects of Marker Density, Number of Quantitative Trait Loci and Heritability of Trait on Genomic Selection Accuracy
The success of genomic selection mainly depends on the extent of linkage disequilibrium (LD) between markers and quantitative trait loci (QTL), number of QTL and heritability (h2) of the traits. The extent of LD depends on the genetic structure of the population and marker density. This study was conducted to determine the effects of marker density, level of heritability, number of QTL, and to ...